Roboflow: Popular autonomous vehicle data set contains critical flaws

A machine learning model is only as good as the data set on which it is trained, and in the domain of self-driving vehicles it is critical that errors in that data do not degrade performance. A troubling report from computer vision startup Roboflow alleges that exactly this scenario occurred. According to founder Brad Dwyer, crucial bits of data were omitted from a corpus used to train self-driving car models. Dwyer writes that Udacity Dataset 2, which contains 15,000 images captured while driving in Mountain View and neighboring cities during daylight, has omissions. Thousands of vehicles, hundreds of pedestrians, and dozens of cyclists go unlabeled in roughly 5,000 of the samples, or 33% (217 samples lack any annotations at all but actually contain cars, trucks, street lights, or pedestrians). Worse are the instances of phantom annotations and duplicated bounding boxes (where a "bounding box" is the rectangle drawn around an object of interest), in addition to "drastically" oversized bounding boxes.
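
Some of the problems described above, such as images with no annotations, exact duplicate boxes, and boxes that cover nearly the whole frame, can be flagged automatically. The following is a minimal sketch of such a check, not Roboflow's actual method. The function name `audit_annotations`, the in-memory data layout, the 1920x1200 image size, and the 90% area threshold are all assumptions made for illustration.

```python
from collections import Counter


def audit_annotations(boxes_by_image, image_size, max_area_fraction=0.9):
    """Flag suspicious annotations for manual review.

    boxes_by_image maps an image id to a list of
    (label, xmin, ymin, xmax, ymax) tuples in pixel coordinates.
    This layout is hypothetical, not the data set's own format.
    """
    width, height = image_size
    image_area = width * height
    report = {"unannotated": [], "duplicates": [], "oversized": []}

    for image_id, boxes in boxes_by_image.items():
        if not boxes:
            # An empty image may be legitimate; it is only a review candidate.
            report["unannotated"].append(image_id)
            continue

        counts = Counter(boxes)
        for box, n in counts.items():
            # The same label and coordinates appearing more than once.
            if n > 1:
                report["duplicates"].append((image_id, box))

        for box in counts:
            _, xmin, ymin, xmax, ymax = box
            area = max(0, xmax - xmin) * max(0, ymax - ymin)
            # Assumed heuristic: a box covering most of the frame is suspect.
            if area > max_area_fraction * image_area:
                report["oversized"].append((image_id, box))

    return report


# Made-up sample annotations, purely for illustration.
sample = {
    "a.jpg": [],
    "b.jpg": [("car", 10, 10, 110, 60), ("car", 10, 10, 110, 60)],
    "c.jpg": [("truck", 0, 0, 1920, 1200), ("pedestrian", 5, 5, 25, 65)],
}
report = audit_annotations(sample, image_size=(1920, 1200))
```

A check like this cannot find unlabeled objects inside otherwise annotated images or phantom annotations; those still require a human reviewer or a second model to look at the pixels.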